Protein complexes in cells by AI‐assisted structural proteomics

Abstract Accurately modeling the structures of proteins and their complexes using artificial intelligence is revolutionizing molecular biology. Experimental data enable a candidate‐based approach to systematically model novel protein assemblies. Here, we use a combination of in‐cell crosslinking mass spectrometry and co‐fractionation mass spectrometry (CoFrac‐MS) to identify protein–protein interactions in the model Gram‐positive bacterium Bacillus subtilis. We show that crosslinking interactions prior to cell lysis reveals protein interactions that are often lost upon cell lysis. We predict the structures of these protein interactions and others in the SubtiWiki database with AlphaFold‐Multimer and, after controlling for the false‐positive rate of the predictions, we propose novel structural models of 153 dimeric and 14 trimeric protein assemblies. Crosslinking MS data independently validates the AlphaFold predictions and scoring. We report and validate novel interactors of central cellular machineries that include the ribosome, RNA polymerase, and pyruvate dehydrogenase, assigning function to several uncharacterized proteins. Our approach uncovers protein–protein interactions inside intact cells, provides structural insight into their interaction interfaces, and is applicable to genetically intractable organisms, including pathogenic bacteria.


Appendix Figure S1: Summary of crosslinking MS results.
A-Cells were crosslinked with DSSO, and after lysis the crosslinked proteome was separated into pellet and supernatant by centrifugation at 20,000 × g. Both were digested with trypsin, the peptides were fractionated, and Datasets 1 and 2 were acquired. The supernatant was also fractionated by size exclusion chromatography; high-molecular-weight fractions were digested, and the peptides were fractionated and acquired as Dataset 3. B-The median intensity of proteins identified by shotgun proteomics is 1.8 × 10^7, but for proteins with identified intra-protein or inter-protein crosslinks it is 6.4 × 10^7 and 2.7 × 10^8, respectively. C-Previously reported PPIs tend to have higher numbers of unique crosslinked residue pairs, suggesting higher abundance of these interactions/protein assemblies in the cell. D-PPIs identified at 2% PPI-level FDR (interactions to seven abundant and highly crosslinked proteins are removed).

Appendix Figure S2: Crosslinks mapped onto structures of known complexes.
Left: Mapping of crosslinking MS data on known complexes. The ribosome and RNA polymerase are represented by their experimental structures (Newing et al., 2020; Sohmen et al., 2015), while ATP synthase and DNA gyrase are modeled onto structures with high sequence identity (Sobti et al., 2016; Vanden Broeck et al., 2019). Satisfied crosslinks (<30 Å Cα-Cα) in blue, violated crosslinks in red. Crosslinks observed in ribosomes in situ include polysome contacts and contacts made in ribosome assembly intermediates. Right: Distance distributions for self and heteromeric crosslinks in the structures on the left. The corresponding random distributions are overlaid in gray.
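The satisfied/violated classification used in these panels can be illustrated with a minimal sketch. It assumes only the criterion stated in the caption (Cα-Cα distance below 30 Å); the function name, the `(chain, residue)` keys, and the toy coordinates are hypothetical and are not taken from the authors' pipeline.

```python
import math

# Cutoff from the caption: a crosslink is "satisfied" if the Cα-Cα distance is < 30 Å.
MAX_CA_DISTANCE = 30.0


def classify_crosslinks(ca_coords, crosslinks, cutoff=MAX_CA_DISTANCE):
    """Return (site_a, site_b, distance, satisfied) for each mappable crosslink.

    ca_coords:  dict mapping a hypothetical (chain, residue_number) key
                to the Cα position (x, y, z) in Å.
    crosslinks: iterable of (site_a, site_b) key pairs.
    Crosslinks with a residue that is absent from the structure are skipped,
    since no distance can be measured for them.
    """
    results = []
    for site_a, site_b in crosslinks:
        if site_a not in ca_coords or site_b not in ca_coords:
            continue
        distance = math.dist(ca_coords[site_a], ca_coords[site_b])
        results.append((site_a, site_b, distance, distance < cutoff))
    return results


# Toy coordinates chosen so the distances are easy to verify by hand.
coords = {
    ("A", 10): (0.0, 0.0, 0.0),
    ("B", 25): (12.0, 16.0, 0.0),  # 20 Å from ("A", 10)
    ("B", 80): (30.0, 40.0, 0.0),  # 50 Å from ("A", 10)
}
links = [
    (("A", 10), ("B", 25)),
    (("A", 10), ("B", 80)),
    (("A", 10), ("C", 5)),  # residue not in the structure, skipped
]

for site_a, site_b, distance, satisfied in classify_crosslinks(coords, links):
    label = "satisfied" if satisfied else "violated"
    print(site_a, site_b, f"{distance:.1f} Å", label)
```

In practice the Cα coordinates would be read from the experimental structure or model; that parsing step is left out here to keep the sketch self-contained.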

Appendix Figure S3: Characterization of YugI and YabR ribosome interactions.
A-Crosslinking MS network of YabR and YugI. Shaded area corresponds to S1-type RNA-binding domains in YabR and YugI. B-AlphaFold predictions of YabR and YugI colored by pLDDT with crosslinks shown. Crosslinking MS shows the high degree of flexibility of the C-terminal helices of the two proteins, which have low pLDDT scores. Satisfied crosslinks (<30 Å Cα-Cα) in blue, violated crosslinks in red. C-Absorbance traces and western blot images resulting from sucrose gradient separation of ribosomes from wild-type strain 168 and strains carrying either YabR-His or YugI-His (Fig. 1C). D-Bacterial two-hybrid experiment to identify interactions between YabR and the ribosomal proteins S6 and S18 as well as between YugI and S3 and S10. All proteins of interest were fused to the T18 and T25 domains of the adenylate cyclase CyaA and interactions were tested in E. coli BTH101. Colonies turn dark as a result of protein interaction, which enables adenylate cyclase activity and subsequently expression of β-galactosidase. A leucine zipper was used as a positive control. E-Resistance to tetracycline. Growth experiment of wild type (gray), ΔyugI (teal), ΔyabR (salmon) and ΔyugI ΔyabR (orange) in MSSM minimal medium with 0.1 mM KCl. The assay compares growth under standard growth conditions or after the addition of 5.2 mM tetracycline to the medium.
Appendix Figure S4: Effects of crosslinking on elution behavior of known protein complexes.
A-Elution profiles of core subunits of known complexes are shown as lines. The average normalized intensity across all subunits per fraction is in black, with the standard deviation in gray. Intensities are averaged across replicates. B-Comparison of the mean elution profile of the complexes from crosslinked and untreated cells. C-Overlap of candidate PPI datasets along with previously known structures from the PDB (sequence identity > 30% and E-value < 10^-3). Note that SubtiWiki interactions were filtered to remove structurally characterized PPIs prior to candidate generation.

Appendix Figure S7: Models of novel interactions.
Models of high-confidence interactions (ipTM > 0.85) not previously annotated in SubtiWiki. Heteromeric crosslinks in orange, self crosslinks in gray.
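The selection behind this figure can be sketched as a simple filter, assuming only what the caption states: a model is kept if its ipTM exceeds 0.85 and the pair is not already annotated. The `Prediction` record, the function names, and the placeholder protein identifiers below are hypothetical illustrations, not the authors' code or data.

```python
from typing import NamedTuple

# Threshold from the caption: high-confidence models have ipTM > 0.85.
IPTM_CUTOFF = 0.85


class Prediction(NamedTuple):
    """Hypothetical record for one predicted dimer."""
    protein_a: str
    protein_b: str
    iptm: float


def pair_key(protein_a, protein_b):
    """Order-independent key, so (A, B) and (B, A) count as the same pair."""
    return frozenset((protein_a, protein_b))


def novel_high_confidence(predictions, annotated_pairs, cutoff=IPTM_CUTOFF):
    """Keep predictions above the ipTM cutoff whose pair is not yet annotated."""
    annotated = {pair_key(a, b) for a, b in annotated_pairs}
    return [
        p
        for p in predictions
        if p.iptm > cutoff and pair_key(p.protein_a, p.protein_b) not in annotated
    ]


# Placeholder identifiers and scores, for illustration only.
predictions = [
    Prediction("P1", "P2", 0.91),  # kept: high ipTM, not annotated
    Prediction("P3", "P4", 0.88),  # dropped: already annotated (as P4-P3)
    Prediction("P5", "P6", 0.60),  # dropped: ipTM below cutoff
]
annotated_pairs = [("P4", "P3")]

for p in novel_high_confidence(predictions, annotated_pairs):
    print(p.protein_a, p.protein_b, p.iptm)
```

Using an unordered key avoids missing an annotated interaction only because the two partners are listed in the opposite order.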